Multi-accent deep neural network acoustic model with accent-specific top layer using the KLD-regularized model adaptation
Authors
Abstract
We propose a multi-accent deep neural network acoustic model with an accent-specific top layer and shared bottom hidden layers. The accent-specific top layer models the distinct patterns of each accent, while the shared bottom hidden layers allow maximum knowledge sharing between the native and the accent models. Because only the top layer differs between accents, the design is computationally efficient, which makes it particularly attractive for deployment in a live speech service. We applied KL-divergence (KLD) regularized model adaptation to train the accent-specific top layer. On the mobile short message dictation (SMD) task, with 1K, 10K, and 100K accent adaptation utterances, the proposed approach achieves word error rate reductions (WERR) of 18.1%, 26.0%, and 28.5% for the British accent and 16.1%, 25.4%, and 30.6% for the Indian accent, respectively, against a baseline cross-entropy (CE) model trained on 400 hours of data. In the 100K-utterance accent adaptation setup, comparable gains are obtained against a baseline CE model trained on 2,000 hours of data. We observe smaller yet significant WER reductions when the baseline model is trained with the sequence-level maximum mutual information (MMI) criterion.
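The abstract gives no implementation details, so the following is only a minimal NumPy sketch of the two ideas it combines, under stated assumptions: shared sigmoid hidden layers feeding one softmax top layer per accent, and KLD-regularized adaptation of a single accent's top layer. It relies on the standard identity that adding a KL-divergence term weighted by rho to the cross-entropy loss is equivalent to training with plain cross-entropy against the interpolated target (1 - rho) * onehot(label) + rho * p_base(y|x). The names `MultiAccentDNN`, `kld_targets`, and `adapt_top_layer`, the use of the native model's posteriors as `p_base`, the layer sizes, and the values of `rho`, `lr`, and `epochs` are all hypothetical choices for illustration, not the authors' settings.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


class MultiAccentDNN:
    """Hypothetical sketch: shared sigmoid hidden layers + one softmax top layer per accent."""

    def __init__(self, dims, num_senones, accents, seed=0):
        rng = np.random.default_rng(seed)
        # Shared bottom hidden layers as (W, b) pairs; every accent uses them.
        self.hidden = [
            (rng.normal(0.0, 0.1, (d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])
        ]
        # Native top layer; each accent-specific top layer starts as a copy of it.
        w_top = rng.normal(0.0, 0.1, (dims[-1], num_senones))
        b_top = np.zeros(num_senones)
        self.top = {name: (w_top.copy(), b_top.copy()) for name in ["native", *accents]}

    def hidden_forward(self, x):
        """Run the shared layers once; the result can feed any top layer."""
        h = x
        for w, b in self.hidden:
            h = 1.0 / (1.0 + np.exp(-(h @ w + b)))
        return h

    def posteriors(self, x, accent):
        """Senone posteriors from the shared layers and the chosen top layer."""
        w, b = self.top[accent]
        return softmax(self.hidden_forward(x) @ w + b)


def kld_targets(labels, p_base, rho):
    """KLD-regularized CE equals plain CE against targets interpolated toward p_base."""
    one_hot = np.eye(p_base.shape[1])[labels]
    return (1.0 - rho) * one_hot + rho * p_base


def adapt_top_layer(model, accent, x, labels, rho=0.5, lr=0.1, epochs=100):
    """Train only model.top[accent]; shared layers and other top layers stay frozen."""
    h = model.hidden_forward(x)
    # Assumption: the unadapted native model supplies the reference distribution.
    target = kld_targets(labels, model.posteriors(x, "native"), rho)
    w, b = model.top[accent]
    losses = []
    for _ in range(epochs):
        p = softmax(h @ w + b)
        losses.append(float(-np.mean(np.sum(target * np.log(p + 1e-12), axis=1))))
        grad = (p - target) / len(x)  # gradient of the mean CE w.r.t. the logits
        w = w - lr * (h.T @ grad)
        b = b - lr * grad.sum(axis=0)
    model.top[accent] = (w, b)
    return losses


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(64, 20))          # toy stand-in for acoustic features
    labels = rng.integers(0, 10, size=64)  # toy senone labels
    model = MultiAccentDNN([20, 32, 32], num_senones=10, accents=["british", "indian"])
    losses = adapt_top_layer(model, "british", x, labels, rho=0.5)
    print(f"regularized CE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In this sketch `hidden_forward` is computed once per input and any accent's top layer can be applied to its output, which mirrors the efficiency argument in the abstract. Setting `rho=1.0` makes the target equal to the native model's posteriors, so the accent layer stays at its initialization; smaller `rho` lets the adaptation data pull it further away.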
Similar resources
Improving deep neural networks based multi-accent Mandarin speech recognition using i-vectors and accent-specific top layer
In this paper, we propose a method that uses i-vectors and model adaptation techniques to improve the performance of deep neural network (DNN) based multi-accent Mandarin speech recognition. I-vectors, which are speaker-specific features, have been shown to be effective in accent identification. They can be used in conjunction with conventional spectral features as the input features of DN...
Empirical Evaluation of Speaker Adaptation on DNN based Acoustic Model
Speaker adaptation aims to estimate a speaker-specific acoustic model from a speaker-independent one to minimize the mismatch between training and testing conditions arising from speaker variability. A variety of neural network adaptation methods have been proposed since deep learning models became mainstream. However, there is still no experimental comparison between different met...
Regularized sequence-level deep neural network model adaptation
We propose a regularized sequence-level (SEQ) deep neural network (DNN) model adaptation methodology as an extension of the previous KL-divergence regularized cross-entropy (CE) adaptation [1]. In this approach, the negative KL-divergence between the baseline and the adapted model is added to the maximum mutual information (MMI) criterion as a regularization term in sequence-level adaptation. We compared ei...
Acoustic model selection for recognition of regional accented speech
Accent is cited as an issue for speech recognition systems [1]. Research has shown that accent mismatch between the training and test data results in a significant reduction in accuracy for Automatic Speech Recognition (ASR) systems. Using an HMM-based ASR system trained on a standard English accent, our study shows that error rates can be up to seven times higher for accented speech than for stan...
Modelling Accents for Automatic Speech Recognition
Accent is cited as an issue for speech recognition systems. If they are to be widely deployed, Automatic Speech Recognition (ASR) systems must deliver consistently high performance across user populations. Hence the development of accent-robust ASR is of significant importance. This research investigates techniques for compensating for the effects of accents on the performance of Hidden Markov Model...